In [2]:
from IPython.display import Image, display, Math, Latex
import os
import matplotlib
%matplotlib inline
matplotlib.use('Agg')
import matplotlib.pyplot as plt
plt.style.use('fivethirtyeight')
from datascience import *
import numpy as np
from scipy.fftpack import *
from mylib2 import *          # basic helper functions
from conclusionlib import *   # code to plot
In [2]:
Image(filename=DATA_PATH + '/img/control_group.png')
Out[2]:

Observation

  1. Profits are skewed to the left.

  2. Most profit and loss occurred in the first 20 seconds of the trade.

  3. The greater the ability of noise to return to its mean (zero) after significant deviations, the greater the potential to profit by trading futures alone.

Intuition

Predict P&L ?

Predict noise return (in time and in price) ?

Model

$$\epsilon_t = (S_t - F_t) - \mathrm{SMA}_{180}(S_t - F_t)$$

$$\hat{\epsilon}_t = \frac{\epsilon_t - \mu_{t-L,t}}{\sigma_{t-L,t}}$$ ??

For t = i, predict $\min\{\, j \mid \mathrm{sign}(\epsilon_i)(\epsilon_i - \epsilon_j) > K \,\}$

$$\Delta F = \mathrm{sign}(\epsilon_i)(F_j - F_i)$$

$$\mathrm{Loss} = \begin{cases} 0 & \text{for } \Delta F > 1.5 \\ 1.5 - \Delta F & \text{for } \Delta F \le 1.5 \end{cases}$$
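
A minimal sketch of the two quantities above, reading the noise as the basis S_t - F_t minus its trailing 180-tick simple moving average, and the loss as a hinge on ΔF with a target of 1.5. The function names (`trailing_sma`, `noise`, `hinge_loss`) are hypothetical and not taken from mylib2; the trailing (not centred) window is an assumption.

```python
import numpy as np

def trailing_sma(x, window):
    """Trailing simple moving average; the first window-1 entries are NaN."""
    x = np.asarray(x, dtype=float)
    out = np.full(x.shape, np.nan)
    csum = np.cumsum(np.insert(x, 0, 0.0))
    out[window - 1:] = (csum[window:] - csum[:-window]) / window
    return out

def noise(spot, futures, window=180):
    """epsilon_t = (S_t - F_t) - SMA_window(S_t - F_t)."""
    basis = np.asarray(spot, dtype=float) - np.asarray(futures, dtype=float)
    return basis - trailing_sma(basis, window)

def hinge_loss(delta_f, target=1.5):
    """0 when delta_f > target, otherwise target - delta_f."""
    return np.maximum(0.0, target - np.asarray(delta_f, dtype=float))
```

Both branches of the loss agree at ΔF = 1.5, so the function is continuous there.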


LSTM_classifier

Input: the past 60 seconds of (noise_180, noise_60, noise_36, noise_16, f_index_mean0) --> label: noise_180_return_K_time < 20s ? 1 : 0

train_classifier_X.npy, cp_classifier.ckpt, pred_classifier.npy (training set, weights checkpoint, prediction)

LAGS=120, T=180, L=120, K=2.5, P(y = 0) = 0.47

3-layer LSTM with 32, 16 and 8 units respectively, output_activation='sigmoid', loss='binary_crossentropy', optimizer='adam'

batch_size=256, epochs=5

Training set: 34 trading days of IF1905 and IF1906

Samples: the previous 60 seconds of data at every tick where abs(noise) > 2.5, sample step = 1 tick
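
The sampling rule can be sketched as follows. This is an illustration, not the code that produced train_classifier_X.npy: `first_return_tick` and `make_samples` are hypothetical names, and `horizon=40` assumes 2 ticks per second (which LAGS=120 over 60 seconds suggests), so that 20 seconds is 40 ticks.

```python
import numpy as np

def first_return_tick(eps, i, k):
    """Smallest j > i with sign(eps[i]) * (eps[i] - eps[j]) > k, or -1 if none."""
    moved = np.sign(eps[i]) * (eps[i] - eps[i + 1:])
    hits = np.flatnonzero(moved > k)
    return i + 1 + int(hits[0]) if hits.size else -1

def make_samples(features, eps, lags=120, k=2.5, horizon=40):
    """features: (n_ticks, n_features); eps: (n_ticks,) noise series.
    Returns X with shape (n_samples, lags, n_features) and binary labels y."""
    features = np.asarray(features, dtype=float)
    eps = np.asarray(eps, dtype=float)
    xs, ys = [], []
    # Stop early enough that every sample has `horizon` ticks of future data.
    for i in range(lags - 1, len(eps) - horizon):
        if abs(eps[i]) <= k:
            continue  # only sample ticks with a significant deviation
        j = first_return_tick(eps, i, k)
        xs.append(features[i - lags + 1:i + 1])
        ys.append(1 if (j != -1 and j - i < horizon) else 0)
    X = np.stack(xs) if xs else np.empty((0, lags, features.shape[1]))
    return X, np.array(ys, dtype=int)
```

Each row of X holds the `lags` ticks ending at the trigger tick i, so no future information enters the input window.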

Testing set: 3 trading days of IF1907

P( y_obs = 1 | y_pred > 0.75 ) = 0.79
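
The conditional probability above is the precision of the classifier at a 0.75 threshold. A minimal sketch of how it could be computed from the prediction and observation arrays; `precision_at` is a hypothetical helper, not part of mylib2.

```python
import numpy as np

def precision_at(pred, obs, threshold=0.75):
    """Estimate P(y_obs = 1 | y_pred > threshold); NaN if nothing is selected."""
    pred = np.asarray(pred, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    selected = pred > threshold
    if not selected.any():
        return float('nan')
    # Labels are stored as floats, so compare against 0.5 rather than == 1.
    return float((obs[selected] > 0.5).mean())
```

With the files used in this notebook, the call would look like `precision_at(np.load(DATA_PATH + '/pred_classifier.npy'), np.load(DATA_PATH + '/valid_classifier_y.npy'))`.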

Backtesting: all 20 trading days of IF1907

Open a position when: tick = i, abs(noise_i) > 2.5 and y_pred > 0.75

Close a position when: tick = j, sign(noise_i) * (noise_i - noise_j) > K

The position remains constant over the interval [i, j]
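
The open and close rules can be sketched as a single pass over the ticks. This is an illustration under stated assumptions, not the code behind plot_pnl: `backtest` is a hypothetical name, P&L per trade is taken as sign(noise_i) * (F_j - F_i) with no transaction costs, only one position is held at a time, and a trade whose close condition is never met before the data ends is left out.

```python
import numpy as np

def backtest(eps, futures, y_pred, k=2.5, p_min=0.75):
    """Return a list of (open_tick, close_tick, pnl) tuples."""
    trades = []
    i, n = 0, len(eps)
    while i < n:
        if abs(eps[i]) > k and y_pred[i] > p_min:
            side = np.sign(eps[i])
            j = i + 1
            # Hold until the noise has moved back by more than k.
            while j < n and side * (eps[i] - eps[j]) <= k:
                j += 1
            if j == n:
                break  # never closed within the data; skip this trade
            trades.append((i, j, float(side * (futures[j] - futures[i]))))
            i = j  # the position was held unchanged from i to j
        else:
            i += 1
    return trades
```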

In [3]:
DATA_PATH = 'C:/Users/admin/Desktop/Machine Learning'

pred = np.load(DATA_PATH + '/pred_classifier.npy')    # predictions
obs = np.load(DATA_PATH + '/valid_classifier_y.npy')  # observations

# Split the predictions by observed label (labels are stored as floats near 0 or 1)
ones = pred[obs[:, 0] > 0.99, 0]   # y_obs = 1
zeros = pred[obs[:, 0] < 0.01, 0]  # y_obs = 0
In [4]:
plt.figure(figsize=(16, 4), dpi=80)
plt.hist(pred, bins=np.arange(0, 1.01, 0.01))
plt.xlabel('y_pred')
plt.ylabel('Frequency')
plt.show()
In [5]:
plt.figure(figsize=(16, 4), dpi=80)
plt.hist(zeros, bins=np.arange(0, 1.01, 0.01), alpha=0.6)
plt.hist(ones, bins=np.arange(0, 1.01, 0.01), alpha=0.6)
plt.xlabel('y_pred')
plt.ylabel('Frequency')
plt.legend(['y_obs = 0', 'y_obs = 1'])
plt.show()
In [6]:
plot_pnl('20190715', '/test_result_classifier_')
20190715
noise_180 avg = -0.031312, stddev = 1.657712
Total P&L = -8.700000, Trade Counter = 37, avg = -0.235135
true_positive = 0.864865
false_negative = 0.000000

In [7]:
plot_pnl('20190715', '/test_result_classifier_dense_')
20190715
noise_180 avg = -0.031312, stddev = 1.657712
Total P&L = -16.500000, Trade Counter = 547, avg = -0.030165
true_positive = 0.943327
false_negative = 0.000000

In [8]:
Image(filename=DATA_PATH + '/img/LSTM_classifier_sample_L_1.png')
Out[8]:
In [9]:
Image(filename=DATA_PATH + '/img/LSTM_classifier_sample_L_2.png')
Out[9]:
In [10]:
Image(filename=DATA_PATH + '/img/LSTM_classifier_sample_W_1.png')
Out[10]:
In [11]:
Image(filename=DATA_PATH + '/img/LSTM_classifier_sample_W_2.png')
Out[11]:
In [12]:
Image(filename=DATA_PATH + '/img/LSTM_classifier_sample_W_3.png')
Out[12]:

LSTM_regression

train_regression_X.npy, cp_regression.ckpt, pred_regression.npy (training set, weights checkpoint, prediction)

Past 60 seconds' (noise, noise_60, noise_36, noise_16, f_index_mean0) --> avg(noise_i : i+20) - noise_i
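
A minimal sketch of this regression target. `regression_target` is a hypothetical name; the sketch assumes the average runs over ticks i through i+horizon inclusive, and whether "20" means ticks or seconds is not stated here, so `horizon` is left as a parameter in ticks.

```python
import numpy as np

def regression_target(eps, horizon=20):
    """y_i = mean(eps[i : i + horizon + 1]) - eps[i] for every i with a full window."""
    eps = np.asarray(eps, dtype=float)
    n = len(eps) - horizon
    if n <= 0:
        return np.empty(0)
    csum = np.cumsum(np.insert(eps, 0, 0.0))
    window_mean = (csum[horizon + 1:] - csum[:n]) / (horizon + 1)
    return window_mean - eps[:n]
```

The last `horizon` ticks of the series have no target because their forward window is incomplete.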

LAGS=120, T=180, L=120, K=2.5
3-layer LSTM with 32, 16 and 8 units respectively, output_activation='linear', loss='mse', optimizer='adam'
batch_size=256, epochs=5

Training set: IF1905 and IF1906

abs(noise) > 2.5, sample_step = 1

Testing set: IF1907

coefficient_of_determination = 0.31

Backtesting

Open a position when: tick = i, abs(y) > {0.75, 1, 1.5, 2}

Close a position when: tick = j, sign(y) (noise_j - noise_i) > 2 abs(y)

The position remains constant over the interval [i, j]

In [13]:
pred = np.load(DATA_PATH + '/pred_regression.npy')
obs = np.load(DATA_PATH + '/valid_regression_y.npy')

r = find_r_of(pred, obs)
print('correlation coefficient = %lf\ncoefficient of determination = %lf' % (r, r ** 2))
plt.figure(figsize=(16, 9), dpi=100)
plt.xlabel('y_pred')
plt.ylabel('y_obs')
plt.scatter(pred, obs, 1.5)
plt.plot([-2, 4], [0, 0], lw=1, color='r')
plt.show()
correlation coefficient = 0.555329
coefficient of determination = 0.308390
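
The figure reported above is the squared correlation r ** 2. `find_r_of` is defined in mylib2 and not shown here; the sketch below assumes it returns the Pearson correlation, as the printed label suggests. The squared correlation equals 1 - SS_res / SS_tot only when the predictions are a least-squares linear fit of the observations, so the second function (a hypothetical `r2_score`) may give a different, possibly lower, number for the same arrays.

```python
import numpy as np

def pearson_r(pred, obs):
    """Pearson correlation between predictions and observations."""
    pred = np.asarray(pred, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    return float(np.corrcoef(pred, obs)[0, 1])

def r2_score(pred, obs):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    pred = np.asarray(pred, dtype=float).ravel()
    obs = np.asarray(obs, dtype=float).ravel()
    ss_res = np.sum((obs - pred) ** 2)
    ss_tot = np.sum((obs - obs.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)
```

Predictions that are perfectly correlated with the observations but wrongly scaled have r = 1 and can still have a negative 1 - SS_res / SS_tot, which matters here because the backtest thresholds use the magnitude of y, not only its direction.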
In [14]:
plt.figure(figsize=(16,4), dpi=80)
plt.hist(pred, bins=np.arange(-5, 5, 0.1))
plt.show()
In [15]:
plt.figure(figsize=(16,4), dpi=80)
plt.hist(obs, bins=np.arange(-5, 5, 0.1))
plt.show()
In [16]:
plot_pnl_regression('20190628', '/test_result_regression_')
20190628
noise_180 avg = -0.039187, stddev = 1.187842
3.800000000000182 5 0.7600000000000364
36091.0 36155.5
4.5 2.199999999999818
prediction = 1.001791, observation = 1.367177
36096.0 36179.5
23.5 -1.099999999999909
prediction = 1.102468, observation = -0.414785
36972.5 37037.0
4.0 1.2999999999997272
prediction = 1.061036, observation = 1.403812
37724.5 37798.5
13.0 1.2000000000002728
prediction = 1.027617, observation = 1.054508
40309.0 40380.5
6.0 0.20000000000027285
prediction = 1.007008, observation = 1.755296

In [6]:
plot_pnl_regression('20190628', '/test_result_regression_DENSE_', detailed=False)
20190628
noise_180 avg = -0.039187, stddev = 1.187842
24.000000000001364 29 0.8275862068965988

Further Considerations

  1. Consider $\bar{F}_{i,i+20} - F_i$ instead of $F_j - F_i$, which has fairly limited predictability.

  2. Adjust X.shape = (LAGS, FEATURES), with longer lags in time but unevenly distributed ticks. Could also consider more features.

  3. How to use trading volume data? Turning F_t to F_v reduces noise in futures price fluctuation.

  4. How to use the value of basis?

  5. *** How to adjust the position more strategically during a trade, so that it reflects an estimate of the likelihood of noise return?

  6. How to implement an asymmetric loss function? (Missing a profitable opportunity is better than making a losing trade, especially since the P&L distribution is considerably left-skewed.)

  7. Replace $\epsilon_t$ in X with $\hat{\epsilon}_t$, although its correlation with $F_t$ will decrease.

  8. Predict ΔMA

  9. Replace y in LSTM_regression with $\mathrm{sign}(\epsilon_i)(\bar{\epsilon}_{i,i+20} - \epsilon_i)$

  10. Revisit the logic used to generate samples and to backtest.
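
One possible starting point for item 6 is a weighted binary cross-entropy that penalises confident predictions of 1 on samples whose label is 0 (the losing trades) more heavily than missed opportunities. This is a sketch only: `asymmetric_bce` and the weight `fp_weight=3.0` are hypothetical, and the version below is plain NumPy for evaluation; using it for training would require rewriting it in the framework's tensor operations.

```python
import numpy as np

def asymmetric_bce(y_true, y_pred, fp_weight=3.0, eps=1e-7):
    """Binary cross-entropy with the y_true = 0 term scaled by fp_weight."""
    y_true = np.asarray(y_true, dtype=float)
    p = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    pos_term = -y_true * np.log(p)                            # missed opportunities
    neg_term = -fp_weight * (1.0 - y_true) * np.log(1.0 - p)  # losing trades
    return float(np.mean(pos_term + neg_term))
```

With `fp_weight=1.0` this reduces to the standard binary cross-entropy used by LSTM_classifier, so the weight can be tuned from that baseline.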
